Porcine deltacoronavirus (PDCoV) is a newly identified coronavirus of pigs that was first reported in Hong Kong in 2012. Since then, many PDCoV isolates have been identified worldwide. In this study, we analyzed codon usage patterns in complete coding sequences of the S gene and in complete PDCoV genomes to gain a deeper understanding of the genetic relationships and evolutionary history of PDCoV. We found that three groups have evolved, each with relatively low codon usage bias (effective number of codons (ENC) of approximately 52). The factors driving this bias were complex; however, natural selection was the primary factor shaping codon usage bias in PDCoVs. Our results suggest that different natural environments may have a significant impact on the genetic characteristics of the strains. More epidemiological surveys are required to examine the factors that led to the emergence and outbreak of this virus.


1. Introduction 

Coronaviruses (CoVs) are the causative agents of major diseases in a variety of avian and mammalian species, including humans. CoVs belong to the subfamily Orthocoronavirinae of the family Coronaviridae, order Nidovirales. The Orthocoronavirinae subfamily is further divided into four genera: Alphacoronavirus, Betacoronavirus, Gammacoronavirus, and the recently identified Deltacoronavirus (Chan et al., 2013; King et al., 2018). To date, six CoVs have been reported in pigs: transmissible gastroenteritis virus (TGEV), porcine respiratory coronavirus (PRCV), swine enteric alphacoronavirus (SeACoV), porcine epidemic diarrhea virus (PEDV), porcine hemagglutinating encephalomyelitis virus (PHEV), and porcine deltacoronavirus (PDCoV) (Pan et al., 2017; Homwong et al., 2016). PDCoV was first reported as an emerging enteropathogenic coronavirus of pigs in Hong Kong in 2012 (Chan et al., 2013; Woo et al., 2012) and was subsequently isolated from a swine farm in Ohio, USA, in 2014 (Wang et al., 2014a). Since then, PDCoV has been reported in many countries and regions, including the USA, Canada, South Korea, mainland China, Mexico, Japan, Thailand, Viet Nam, and Lao PDR (Lee and Lee, 2014; Suzuki et al., 2018; Saeng-Chuto et al., 2017; Wang et al., 2014b; Ajayi et al., 2018; Perez-Rivera et al., 2019). A previous study showed that global PDCoVs comprise the China lineage, the USA/Japan/South Korea lineage, and the Viet Nam/Laos/Thailand lineage (Zhang et al., 2019). PDCoV is an enveloped, positive-sense, single-stranded RNA virus with a genome of approximately 25.4 kb. The genome comprises a 5'UTR, ORF1a/1b, the spike (S), envelope (E), membrane (M), nonstructural protein 6 (NS6), nucleocapsid (N), and nonstructural protein 7 (NS7) genes, and a 3'UTR (Lee and Lee, 2014).

The codon usage pattern is an important indicator of genome evolution. Owing to the redundancy of the genetic code, every amino acid except methionine and tryptophan can be encoded by more than one codon; codons encoding the same amino acid are known as synonymous codons. Synonymous codons are not used randomly: some are used more frequently than others, a phenomenon referred to as codon usage bias (Marin et al., 1989). Codon usage bias has been reported for several RNA viruses, but its degree varies among viruses. For instance, Rubella virus and Rotavirus show strong codon usage bias, whereas Equine infectious anemia virus (EIAV), Ebola virus (EBOV), the N gene of Rabies virus (RABV), and Porcine epidemic diarrhea virus (PEDV) show weak codon usage bias (Belalov and Lukashev, 2013; Yin et al., 2013; Cristina et al., 2015; Chen et al., 2014; He et al., 2017). Natural selection, mutation pressure, tRNA abundance, RNA structure, and gene length all contribute to codon usage bias (Jenkins and Holmes, 2003; Parmley and Hurst, 2007; Hershberg and Petrov, 2008; Plotkin and Kudla, 2011). Both the virus and its host can influence codon usage, which likely affects viral survival, evolution, fitness, and evasion of host immune defenses (Li et al., 2018b, 2019; He et al., 2019). Factors such as natural selection and mutational bias can cause synonymous codon usage to diverge (Sharp and Li, 1986).

Investigating viral codon usage patterns can provide insights into molecular evolution and the regulation of viral gene expression, and can assist vaccine design, for which high levels of viral antigen expression are likely needed to induce immunity (Butt et al., 2014). Given the recent increase in PDCoV epidemics and the threat they pose to pork production, in the present study we report a comprehensive genome-wide investigation of PDCoV codon usage and evaluate the factors that may influence it.

Fig. 1. (A) Maximum likelihood tree of the PDCoV S gene reconstructed by RAxML (v8.2.10). (B) Bayesian inference tree of the PDCoV S gene reconstructed by MrBayes (v3.2.7). The China group, USA-Japan-Korea group, and Thailand-Early China-Vietnam group are shown in light blue, green, and pink, respectively.

Fig. 2. Principal component analysis (PCA) of the PDCoV S gene (A) and complete coding genomes (B). The China, USA-Japan-Korea, and Thailand-Early China-Vietnam groups are shown in light blue, green, and pink, respectively.


2. Materials and methods 

2.1. Data analysis 

We retrieved all PDCoV sequences available in the National Center for Biotechnology Information (NCBI) nucleotide database (http://www.ncbi.nlm.nih.gov) up to April 2019. Detailed information (serial number, strain name, accession number, location, and year of isolation) for all 159 complete coding sequences of the S gene and all 98 complete coding genomes of PDCoV (with coding regions concatenated in the order ORF1ab-S-E-M-NS6-N-NS7) is provided in the supplementary materials (Table S1).
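The concatenated complete-coding-genome dataset described above can be assembled per strain roughly as follows. This is a minimal Python sketch under our own assumptions: the helper name `concatenate_cds`, the input dictionary keyed by gene name, and the in-frame length check are illustrative and are not taken from the original workflow.

```python
# Hypothetical helper (not from the original study): join the per-gene coding
# sequences of one PDCoV strain in the order ORF1ab-S-E-M-NS6-N-NS7.
GENE_ORDER = ["ORF1ab", "S", "E", "M", "NS6", "N", "NS7"]


def concatenate_cds(cds_by_gene: dict[str, str]) -> str:
    """Return the concatenated coding sequence in the RNA alphabet (T -> U)."""
    parts = []
    for gene in GENE_ORDER:
        if gene not in cds_by_gene:
            raise KeyError(f"missing coding sequence for {gene}")
        seq = cds_by_gene[gene].upper().replace("T", "U")
        # Illustrative sanity check (our assumption): each region should be in frame.
        if len(seq) % 3 != 0:
            raise ValueError(f"{gene}: length {len(seq)} is not a multiple of 3")
        parts.append(seq)
    return "".join(parts)


if __name__ == "__main__":
    toy = {gene: "ATGGCT" for gene in GENE_ORDER}  # toy sequences, not real data
    print(len(concatenate_cds(toy)))  # 7 genes x 6 nt = 42
```

Raising on a missing or out-of-frame region, rather than silently skipping it, keeps every concatenated genome directly comparable codon by codon.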

2.2. Recombination and phylogenetic group analysis 

Potential recombination signals were detected using RDP4 (Recombination Detection Program version 4) (Martin et al., 2015) with default settings. Seven methods were used: RDP, GENECONV, Chimaera, MaxChi, BootScan, SiScan, and 3Seq. Four methods were applied first; the remaining sequences were then re-analyzed with at least two methods until no recombination signal was detected.

Phylogenetic trees were reconstructed with RAxML (v8.2.10) (Stamatakis, 2014) and MrBayes (v3.2.7) (Ronquist et al., 2012) using non-recombinant sequences. The maximum likelihood (ML) tree was reconstructed under the GTR + Gamma substitution model with 1000 bootstrap replicates. For the Bayesian inference (BI) tree, 1,000,000 generations were run, and the first 25% of samples were discarded as burn-in. The final trees were visualized in FigTree (v1.4.4) (http://tree.bio.ed.ac.uk/software/figtree/).

Table 1

The nucleotide composition and properties of the S gene of the PDCoV strains.

Strain      A%     U%     C%     G%     C%+G%  GC1s   GC2s   GC12s  GC3s   U3s    C3s    A3s    G3s    ENC
JQ065042    0.280  0.306  0.237  0.177  0.414  0.478  0.412  0.445  0.351  0.476  0.267  0.308  0.171  52.250
KP757891    0.279  0.305  0.239  0.177  0.415  0.480  0.413  0.446  0.354  0.471  0.270  0.309  0.172  52.410
KP757892    0.280  0.305  0.239  0.176  0.415  0.478  0.411  0.444  0.357  0.467  0.276  0.310  0.168  52.430
KR131621    0.280  0.307  0.237  0.176  0.413  0.476  0.412  0.444  0.352  0.471  0.270  0.311  0.169  52.300
KT336560    0.278  0.306  0.238  0.178  0.417  0.481  0.413  0.447  0.356  0.477  0.268  0.300  0.176  52.210
KU204694    0.279  0.306  0.237  0.178  0.415  0.480  0.413  0.447  0.351  0.472  0.267  0.311  0.170  52.060
KU204695    0.281  0.307  0.236  0.176  0.412  0.477  0.408  0.443  0.352  0.473  0.269  0.310  0.170  52.140
KU204696    0.279  0.308  0.235  0.178  0.413  0.480  0.408  0.444  0.351  0.475  0.268  0.310  0.169  52.050
KU204697    0.281  0.307  0.236  0.176  0.412  0.477  0.408  0.443  0.352  0.473  0.269  0.310  0.170  52.140
KU665558    0.280  0.305  0.238  0.177  0.415  0.478  0.413  0.445  0.353  0.472  0.269  0.309  0.172  52.420
KU981059    0.280  0.306  0.237  0.177  0.414  0.479  0.413  0.446  0.351  0.474  0.269  0.310  0.168  52.030
KX534090    0.278  0.307  0.238  0.177  0.415  0.480  0.414  0.447  0.352  0.472  0.268  0.310  0.169  52.100
KY065120    0.280  0.305  0.239  0.176  0.415  0.478  0.411  0.445  0.356  0.468  0.276  0.310  0.167  52.710
KY078905    0.280  0.307  0.237  0.176  0.413  0.476  0.413  0.444  0.351  0.474  0.267  0.310  0.171  52.100
KY078907    0.280  0.305  0.239  0.176  0.415  0.478  0.411  0.444  0.356  0.468  0.276  0.310  0.166  52.610
KY078909    0.280  0.305  0.239  0.176  0.415  0.479  0.410  0.445  0.356  0.468  0.278  0.308  0.163  52.360
KY078910    0.280  0.306  0.239  0.176  0.414  0.478  0.410  0.444  0.354  0.470  0.275  0.310  0.165  52.600
KY078911    0.280  0.306  0.239  0.176  0.414  0.478  0.411  0.444  0.354  0.470  0.275  0.310  0.165  52.580
KY078914    0.280  0.306  0.239  0.175  0.414  0.477  0.411  0.444  0.354  0.470  0.275  0.310  0.165  52.580
KY293677    0.279  0.305  0.238  0.178  0.416  0.481  0.413  0.447  0.353  0.471  0.270  0.310  0.169  52.400
KY293678    0.279  0.306  0.237  0.177  0.415  0.477  0.413  0.445  0.354  0.470  0.271  0.310  0.170  52.550
KY496312    0.280  0.306  0.237  0.177  0.414  0.477  0.411  0.444  0.353  0.470  0.270  0.311  0.171  52.380
KY513724    0.280  0.304  0.239  0.177  0.416  0.477  0.412  0.444  0.359  0.466  0.275  0.309  0.172  53.040
LC216914    0.279  0.306  0.237  0.178  0.415  0.480  0.412  0.446  0.353  0.473  0.269  0.308  0.171  52.130
MF037204    0.281  0.304  0.239  0.176  0.415  0.475  0.413  0.444  0.357  0.465  0.275  0.313  0.169  52.840
MF041982    0.279  0.305  0.239  0.177  0.416  0.480  0.413  0.446  0.357  0.470  0.274  0.306  0.170  52.350
MF280390    0.279  0.306  0.239  0.176  0.415  0.478  0.411  0.444  0.356  0.468  0.276  0.309  0.166  52.690
MF431742    0.278  0.307  0.237  0.178  0.415  0.479  0.413  0.446  0.354  0.477  0.267  0.301  0.174  52.150
MF431743    0.280  0.302  0.241  0.177  0.418  0.482  0.413  0.448  0.358  0.466  0.274  0.310  0.171  52.960
MF461406    0.281  0.307  0.236  0.176  0.413  0.476  0.408  0.442  0.353  0.470  0.269  0.312  0.172  51.910
MF461408    0.280  0.306  0.237  0.177  0.414  0.475  0.413  0.444  0.353  0.471  0.269  0.309  0.171  52.300
MF461409    0.280  0.307  0.237  0.177  0.414  0.475  0.413  0.444  0.353  0.471  0.269  0.309  0.171  52.350
MF948005    0.281  0.304  0.240  0.176  0.415  0.475  0.413  0.444  0.357  0.464  0.275  0.314  0.168  52.860
MG242062    0.279  0.304  0.239  0.177  0.416  0.479  0.413  0.446  0.356  0.468  0.272  0.310  0.172  52.880
MG832584    0.280  0.306  0.237  0.177  0.414  0.476  0.412  0.444  0.354  0.472  0.269  0.308  0.173  52.330
MH708123    0.279  0.306  0.237  0.177  0.415  0.479  0.411  0.445  0.354  0.472  0.269  0.308  0.173  52.560
MH708124    0.279  0.306  0.237  0.177  0.415  0.479  0.411  0.445  0.354  0.472  0.269  0.308  0.173  52.560
MH708125    0.279  0.306  0.237  0.177  0.415  0.479  0.411  0.445  0.354  0.472  0.269  0.308  0.173  52.560
MH715491    0.280  0.306  0.239  0.176  0.414  0.479  0.411  0.445  0.352  0.472  0.273  0.310  0.165  52.360
MK248485    0.280  0.306  0.237  0.177  0.414  0.478  0.411  0.444  0.354  0.472  0.269  0.308  0.173  52.510
NC_039208   0.277  0.304  0.240  0.178  0.418  0.481  0.413  0.447  0.361  0.469  0.275  0.301  0.175  52.360
LC260038    0.279  0.304  0.240  0.177  0.417  0.475  0.412  0.444  0.364  0.464  0.279  0.306  0.173  52.730
LC260039    0.279  0.303  0.241  0.177  0.418  0.477  0.412  0.444  0.365  0.462  0.281  0.305  0.173  52.810
LC260040    0.279  0.303  0.241  0.177  0.418  0.477  0.411  0.444  0.365  0.462  0.281  0.305  0.173  52.870
LC260041    0.279  0.303  0.240  0.177  0.417  0.476  0.412  0.444  0.364  0.464  0.279  0.305  0.173  52.710
LC260042    0.280  0.303  0.240  0.177  0.417  0.476  0.412  0.444  0.363  0.464  0.278  0.307  0.173  52.750
LC260043    0.280  0.304  0.240  0.176  0.416  0.475  0.412  0.444  0.362  0.465  0.279  0.307  0.171  52.520
LC260044    0.280  0.303  0.241  0.177  0.417  0.476  0.412  0.444  0.364  0.463  0.280  0.307  0.172  52.650
LC260045    0.280  0.301  0.242  0.177  0.419  0.477  0.413  0.445  0.367  0.457  0.285  0.308  0.171  52.700
KJ462462    0.280  0.303  0.241  0.176  0.417  0.477  0.412  0.444  0.363  0.464  0.280  0.307  0.171  52.660
KJ481931    0.280  0.303  0.240  0.177  0.417  0.477  0.412  0.444  0.362  0.465  0.277  0.307  0.173  52.680
KJ567050    0.279  0.304  0.240  0.177  0.417  0.477  0.411  0.444  0.363  0.465  0.279  0.305  0.173  52.520
KJ569769    0.279  0.303  0.241  0.177  0.418  0.477  0.414  0.445  0.364  0.462  0.280  0.307  0.171  52.750
KJ584355    0.279  0.304  0.240  0.177  0.417  0.476  0.412  0.444  0.362  0.466  0.277  0.305  0.173  52.690
KJ584356    0.280  0.303  0.240  0.177  0.417  0.475  0.411  0.443  0.364  0.464  0.280  0.306  0.172  52.620
KJ584357    0.279  0.303  0.240  0.177  0.417  0.476  0.413  0.444  0.363  0.464  0.278  0.306  0.173  52.760
KJ584358    0.279  0.303  0.240  0.177  0.417  0.476  0.412  0.444  0.364  0.464  0.279  0.305  0.173  52.690
KJ584359    0.279  0.304  0.240  0.177  0.417  0.476  0.412  0.444  0.363  0.465  0.278  0.305  0.173  52.720
KJ601777    0.279  0.304  0.240  0.177  0.417  0.476  0.412  0.444  0.362  0.465  0.278  0.306  0.172  52.720
KJ601778    0.279  0.304  0.240  0.177  0.417  0.476  0.412  0.444  0.362  0.465  0.278  0.306  0.172  52.780
KJ601779    0.280  0.303  0.240  0.177  0.417  0.477  0.412  0.444  0.362  0.465  0.277  0.307  0.173  52.680
KJ601780    0.279  0.303  0.241  0.177  0.418  0.477  0.412  0.445  0.365  0.463  0.280  0.305  0.173  52.880
KJ620016    0.280  0.304  0.240  0.177  0.417  0.476  0.412  0.444  0.362  0.465  0.278  0.307  0.172  52.600
KJ769231    0.280  0.304  0.240  0.176  0.416  0.476  0.411  0.444  0.361  0.466  0.278  0.307  0.171  52.550
KM012168    0.279  0.304  0.240  0.177  0.417  0.474  0.414  0.444  0.363  0.465  0.278  0.304  0.174  52.600
KP981395    0.279  0.304  0.240  0.177  0.417  0.474  0.414  0.444  0.363  0.465  0.278  0.304  0.174  52.600
KP995358    0.280  0.303  0.241  0.176  0.417  0.477  0.411  0.444  0.363  0.464  0.280  0.307  0.171  52.710
KR150443    0.280  0.304  0.239  0.177  0.416  0.475  0.412  0.444  0.361  0.466  0.277  0.305  0.173  52.640
KR265847    0.279  0.304  0.241  0.177  0.417  0.477  0.411  0.444  0.364  0.465  0.280  0.304  0.172  52.740
KR265848    0.279  0.304  0.240  0.177  0.416  0.476  0.410  0.443  0.363  0.465  0.280  0.305  0.172  52.740
KR265849    0.280  0.303  0.241  0.176  0.417  0.476  0.413  0.444  0.363  0.463  0.280  0.307  0.171  52.590
KR265850    0.280  0.303  0.241  0.176  0.417  0.476  0.413  0.444  0.363  0.463  0.280  0.307  0.171  52.590
KR265851    0.279  0.304  0.240  0.177  0.417  0.476  0.412  0.444  0.364  0.464  0.279  0.305  0.173  52.720
KR265852    0.279  0.304  0.240  0.177  0.417  0.476  0.411  0.444  0.363  0.465  0.278  0.305  0.173  52.710
KR265853    0.279  0.303  0.241  0.177  0.417  0.476  0.411  0.444  0.365  0.462  0.281  0.306  0.173  52.850
KR265854    0.279  0.304  0.240  0.177  0.417  0.475  0.412  0.444  0.364  0.463  0.280  0.306  0.172  52.790
KR265855    0.279  0.304  0.240  0.177  0.417  0.475  0.411  0.443  0.364  0.463  0.280  0.306  0.172  52.740
KR265856    0.279  0.303  0.240  0.177  0.417  0.476  0.411  0.444  0.365  0.463  0.280  0.305  0.173  52.820
KR265857    0.279  0.303  0.240  0.177  0.417  0.476  0.411  0.444  0.365  0.463  0.280  0.305  0.173  52.820
KR265858    0.280  0.304  0.240  0.177  0.417  0.476  0.411  0.444  0.363  0.465  0.279  0.305  0.172  52.580
KR265859    0.279  0.303  0.240  0.177  0.417  0.476  0.412  0.444  0.364  0.464  0.279  0.305  0.173  52.710
KR265860    0.279  0.304  0.240  0.177  0.417  0.476  0.412  0.444  0.363  0.465  0.278  0.305  0.173  52.720
KR265861    0.279  0.304  0.240  0.177  0.417  0.476  0.412  0.444  0.363  0.465  0.278  0.305  0.173  52.720
KR265862    0.279  0.303  0.241  0.177  0.417  0.477  0.413  0.445  0.363  0.464  0.279  0.307  0.171  52.760
KR265863    0.280  0.303  0.241  0.176  0.417  0.477  0.412  0.444  0.362  0.464  0.280  0.307  0.170  52.590
KR265864    0.280  0.303  0.240  0.176  0.416  0.475  0.410  0.443  0.364  0.464  0.280  0.305  0.172  52.680
KR265865    0.280  0.304  0.240  0.176  0.416  0.474  0.412  0.443  0.363  0.464  0.278  0.306  0.174  52.660
KT381613    0.280  0.304  0.240  0.176  0.417  0.476  0.412  0.444  0.362  0.465  0.279  0.307  0.171  52.530
KX022602    0.280  0.303  0.240  0.177  0.417  0.476  0.412  0.444  0.363  0.464  0.278  0.307  0.173  52.550
KX022603    0.280  0.303  0.240  0.177  0.417  0.477  0.412  0.444  0.363  0.464  0.278  0.307  0.173  52.550
KX022604    0.280  0.303  0.240  0.177  0.417  0.477  0.412  0.444  0.363  0.464  0.278  0.307  0.173  52.550
KX022605    0.280  0.304  0.240  0.177  0.417  0.477  0.412  0.444  0.361  0.466  0.276  0.307  0.173  52.470
MK478380    0.280  0.303  0.241  0.177  0.417  0.476  0.412  0.444  0.364  0.463  0.280  0.307  0.172  52.670
MK478381    0.279  0.304  0.241  0.176  0.417  0.477  0.411  0.444  0.362  0.464  0.280  0.307  0.169  52.640
MK478382    0.280  0.304  0.241  0.176  0.417  0.476  0.412  0.444  0.362  0.467  0.279  0.304  0.171  52.570
MK478383    0.279  0.303  0.241  0.176  0.417  0.476  0.413  0.444  0.364  0.464  0.281  0.305  0.170  52.690
KM820765    0.279  0.303  0.241  0.177  0.417  0.476  0.412  0.444  0.365  0.463  0.280  0.305  0.173  52.730
KR060082    0.279  0.303  0.241  0.177  0.418  0.477  0.412  0.445  0.365  0.462  0.281  0.305  0.173  52.900
KR060083    0.279  0.303  0.241  0.177  0.418  0.477  0.413  0.445  0.365  0.463  0.280  0.305  0.173  52.840
KX710201    0.280  0.305  0.239  0.176  0.415  0.474  0.411  0.443  0.360  0.465  0.277  0.308  0.171  52.670
KX710202    0.280  0.305  0.239  0.176  0.415  0.474  0.411  0.443  0.361  0.465  0.277  0.307  0.172  52.680
KY354363    0.280  0.305  0.239  0.176  0.415  0.474  0.411  0.443  0.360  0.465  0.277  0.308  0.171  52.670
KY354364    0.280  0.305  0.239  0.176  0.415  0.474  0.411  0.443  0.360  0.465  0.277  0.308  0.171  52.670
KY364365    0.279  0.304  0.240  0.177  0.417  0.474  0.413  0.444  0.363  0.465  0.278  0.304  0.174  52.670
KY926511    0.279  0.305  0.240  0.176  0.415  0.473  0.412  0.442  0.361  0.465  0.278  0.306  0.172  52.650
KY926512    0.279  0.306  0.239  0.176  0.415  0.473  0.413  0.443  0.360  0.466  0.277  0.306  0.171  52.540
KU051641    0.280  0.304  0.240  0.176  0.416  0.483  0.412  0.448  0.353  0.473  0.271  0.309  0.169  52.940
KU051642    0.280  0.304  0.240  0.176  0.416  0.483  0.412  0.448  0.353  0.473  0.271  0.309  0.169  52.940
KU051643    0.280  0.304  0.240  0.176  0.416  0.483  0.412  0.448  0.353  0.474  0.270  0.308  0.169  52.940
KU051644    0.281  0.303  0.240  0.176  0.416  0.483  0.412  0.448  0.353  0.473  0.271  0.309  0.169  52.920
KU051645    0.281  0.303  0.240  0.176  0.416  0.483  0.412  0.448  0.353  0.473  0.271  0.309  0.169  52.920
KU051646    0.281  0.303  0.240  0.176  0.416  0.483  0.412  0.448  0.353  0.473  0.271  0.309  0.169  52.920
KU051647    0.281  0.303  0.240  0.176  0.416  0.483  0.412  0.448  0.353  0.473  0.271  0.309  0.169  52.920
KU051648    0.281  0.303  0.240  0.176  0.416  0.483  0.412  0.448  0.353  0.473  0.271  0.309  0.169  52.920
KU051649    0.281  0.303  0.240  0.176  0.416  0.483  0.412  0.448  0.353  0.473  0.271  0.309  0.169  52.920
KU051650    0.280  0.304  0.240  0.176  0.416  0.483  0.412  0.448  0.353  0.473  0.271  0.309  0.169  52.940
KU051651    0.280  0.304  0.240  0.176  0.416  0.483  0.412  0.448  0.354  0.473  0.271  0.308  0.169  52.900
KU051652    0.280  0.304  0.240  0.176  0.416  0.483  0.412  0.448  0.353  0.473  0.271  0.309  0.169  52.940
KU051653    0.280  0.303  0.240  0.176  0.416  0.483  0.412  0.448  0.354  0.472  0.272  0.309  0.169  53.000
KU051654    0.280  0.303  0.240  0.176  0.416  0.483  0.412  0.448  0.354  0.472  0.272  0.309  0.169  53.000
KU051655    0.280  0.305  0.239  0.176  0.415  0.483  0.411  0.447  0.352  0.475  0.270  0.307  0.169  52.940
KU051656    0.280  0.304  0.239  0.176  0.416  0.483  0.412  0.448  0.352  0.474  0.270  0.309  0.169  52.920
KU984334    0.280  0.302  0.241  0.177  0.417  0.484  0.413  0.448  0.356  0.469  0.274  0.309  0.168  53.140
KX118627    0.281  0.303  0.240  0.177  0.416  0.482  0.412  0.447  0.354  0.469  0.273  0.311  0.168  53.210
KX834351    0.279  0.308  0.237  0.177  0.413  0.483  0.411  0.447  0.346  0.485  0.264  0.303  0.167  52.270
KX834352    0.279  0.308  0.237  0.176  0.413  0.482  0.411  0.447  0.346  0.484  0.265  0.304  0.166  52.320
KY078906    0.279  0.308  0.236  0.177  0.413  0.484  0.411  0.447  0.343  0.486  0.261  0.306  0.167  52.080
MF642324    0.280  0.306  0.238  0.177  0.414  0.476  0.412  0.444  0.355  0.472  0.273  0.306  0.169  52.780
MF642323    0.279  0.306  0.238  0.176  0.415  0.475  0.413  0.444  0.356  0.472  0.274  0.305  0.169  52.770
MF642322    0.279  0.306  0.238  0.177  0.415  0.477  0.414  0.445  0.355  0.472  0.273  0.306  0.169  52.830
MF642325    0.278  0.307  0.237  0.178  0.415  0.478  0.412  0.445  0.355  0.472  0.272  0.305  0.171  52.830
KP757890    0.279  0.305  0.239  0.177  0.416  0.479  0.414  0.447  0.354  0.475  0.270  0.306  0.170  52.460
Average     0.280  0.304  0.239  0.177  0.416  0.478  0.412  0.445  0.358  0.468  0.275  0.307  0.171  52.630
SD          0.001  0.001  0.001  0.001  0.001  0.003  0.001  0.002  0.005  0.005  0.005  0.002  0.002  0.253

2.3. Principal component analysis (PCA) 


To study the relationships between the variables (codons) and the samples (sequences), the multivariate statistical method PCA was applied. PCA is a mathematical transformation that converts a set of correlated variables (here, the relative synonymous codon usage (RSCU) values) into a smaller number of uncorrelated variables, called principal components. Each coding sequence was represented as a 59-dimensional vector, in which each dimension corresponds to the RSCU value of one of the 59 synonymous codons, excluding AUG, UGG, and the three stop codons. The parameters used for the PCA were calculated with CodonW (http://codonw.sourceforge.net/).
Table 2

The relative synonymous codon usage (RSCU) of the S gene and complete genomes of PDCoV strains. The numbers in bold denote the eighteen abundant codons of three genotypic groups and all sequences.

Columns: (1) China (S); (2) Thailand & Early China & Vietnam (S); (3) USA & Japan & Korea (S); (4) All (S); (5) China (complete genome); (6) Thailand & Early China & Vietnam (complete genome); (7) USA & Japan & Korea (complete genome); (8) All (complete genome).

Codon    (1)   (2)   (3)   (4)   (5)   (6)   (7)   (8)
UUU(F)   1.29  1.21  1.24  1.25  1.22  1.21  1.23  1.22
UUC(F)   0.71  0.79  0.76  0.75  0.78  0.79  0.77  0.78
UUA(L)   1.04  0.84  1.04  1.00  0.77  0.76  0.79  0.79
UUG(L)   0.73  0.71  0.74  0.73  0.86  0.84  0.87  0.87
CUU(L)   1.83  1.89  1.79  1.82  1.99  1.96  1.96  1.96
CUC(L)   1.04  1.03  1.04  1.04  1.03  1.02  1.05  1.04
CUA(L)   0.79  0.83  0.80  0.80  0.65  0.70  0.64  0.65
CUG(L)   0.57  0.69  0.60  0.61  0.69  0.72  0.69  0.70
AUU(I)   1.53  1.51  1.51  1.52  1.54  1.53  1.53  1.53
AUC(I)   0.64  0.63  0.66  0.65  0.73  0.74  0.75  0.74
AUA(I)   0.83  0.87  0.82  0.83  0.73  0.73  0.73  0.73
GUU(V)   2.13  2.13  2.15  2.14  1.79  1.79  1.77  1.77
GUC(V)   0.67  0.72  0.65  0.67  0.72  0.73  0.73  0.73
GUA(V)   0.77  0.79  0.71  0.74  0.77  0.76  0.76  0.76
GUG(V)   0.43  0.35  0.50  0.45  0.72  0.72  0.74  0.74
UCU(S)   2.11  1.94  1.95  1.99  1.95  1.91  1.91  1.91
UCC(S)   0.87  0.97  1.09  1.00  0.71  0.73  0.75  0.75
UCA(S)   1.25  1.34  1.29  1.29  1.49  1.57  1.48  1.49
UCG(S)   0.29  0.30  0.26  0.28  0.36  0.32  0.38  0.37
AGU(S)   0.78  0.77  0.71  0.74  0.98  0.95  0.97  0.97
AGC(S)   0.69  0.68  0.71  0.70  0.51  0.52  0.52  0.52
CCU(P)   1.89  1.90  1.95  1.92  1.59  1.61  1.62  1.61
CCC(P)   0.67  0.84  0.69  0.71  0.65  0.64  0.63  0.63
CCA(P)   1.04  0.88  0.98  0.98  1.43  1.42  1.45  1.44
CCG(P)   0.40  0.38  0.39  0.39  0.33  0.33  0.31  0.31
ACU(T)   1.89  2.03  1.95  1.95  1.68  1.75  1.69  1.70
ACC(T)   0.96  0.84  0.90  0.91  1.00  0.93  0.98  0.97
ACA(T)   0.98  0.98  0.96  0.97  1.05  1.05  1.05  1.05
ACG(T)   0.17  0.14  0.20  0.18  0.27  0.28  0.28  0.28
GCU(A)   1.38  1.47  1.39  1.40  1.72  1.75  1.73  1.73
GCC(A)   0.76  0.74  0.77  0.76  0.67  0.66  0.66  0.66
GCA(A)   1.76  1.67  1.79  1.76  1.30  1.28  1.30  1.30
GCG(A)   0.09  0.12  0.05  0.08  0.31  0.31  0.31  0.31
UAU(Y)   1.12  1.18  0.99  1.07  1.07  1.08  1.06  1.06
UAC(Y)   0.88  0.82  1.01  0.93  0.93  0.92  0.94  0.94
CAU(H)   0.95  0.97  0.96  0.96  1.15  1.19  1.16  1.16
CAC(H)   1.05  1.03  1.04  1.04  0.85  0.81  0.84  0.84
CAA(Q)   0.97  1.02  0.94  0.96  0.98  1.01  0.96  0.96
CAG(Q)   1.03  0.98  1.06  1.04  1.02  0.99  1.04  1.04
AAU(N)   1.15  1.16  1.19  1.17  1.08  1.06  1.08  1.08
AAC(N)   0.85  0.84  0.81  0.83  0.92  0.94  0.92  0.92
AAA(K)   1.12  1.16  1.10  1.12  0.93  0.96  0.94  0.94
AAG(K)   0.88  0.84  0.90  0.88  1.07  1.04  1.06  1.06
GAU(D)   1.19  1.23  1.16  1.18  1.10  1.14  1.10  1.10
GAC(D)   0.81  0.77  0.84  0.82  0.90  0.86  0.90  0.90
GAA(E)   1.03  1.06  1.03  1.04  0.96  0.93  0.96  0.96
GAG(E)   0.97  0.94  0.97  0.96  1.04  1.07  1.04  1.04
UGU(C)   1.35  1.30  1.28  1.31  1.14  1.15  1.12  1.12
UGC(C)   0.65  0.70  0.72  0.69  0.86  0.85  0.88  0.88
CGU(R)   1.36  1.14  1.26  1.26  1.72  1.69  1.75  1.75
CGC(R)   0.52  0.73  0.56  0.58  1.14  1.18  1.13  1.13
CGA(R)   0.43  0.41  0.43  0.43  0.50  0.49  0.48  0.48
CGG(R)   0.87  0.81  0.83  0.84  0.53  0.51  0.53  0.53
AGA(R)   1.96  1.83  1.96  1.93  1.36  1.34  1.36  1.36
AGG(R)   0.86  1.08  0.97  0.96  0.76  0.79  0.75  0.76
GGU(G)   1.64  1.71  1.64  1.65  1.87  1.88  1.88  1.88
GGC(G)   1.65  1.57  1.71  1.66  1.03  1.03  1.03  1.03
GGA(G)   0.55  0.59  0.59  0.58  0.87  0.85  0.86  0.86
GGG(G)   0.16  0.13  0.07  0.11  0.23  0.24  0.23  0.23
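To make the PCA input described in Section 2.3 concrete, the following Python sketch (assuming NumPy) builds the 59-codon RSCU vector of each sequence and projects the sequences onto their first principal components. It is an illustration only: the study used CodonW, the function names `rscu_vector` and `pca` are hypothetical, RSCU is computed with the standard definition (observed codon count divided by the mean count of its synonymous family), amino acids absent from a sequence are assigned RSCU 0 by our own choice, and the projection is a plain SVD-based PCA rather than any CodonW-specific procedure.

```python
import numpy as np

# Standard genetic code (NCBI table 1); bases ordered U, C, A, G.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Group codons by amino acid and keep only degenerate families; this drops
# the three stop codons ('*'), AUG (Met) and UGG (Trp), leaving 59 codons.
FAMILIES: dict[str, list[str]] = {}
for codon, aa in CODE.items():
    if aa != "*":
        FAMILIES.setdefault(aa, []).append(codon)
FAMILIES = {aa: cs for aa, cs in FAMILIES.items() if len(cs) > 1}
SYN_CODONS = [c for cs in FAMILIES.values() for c in cs]


def rscu_vector(cds: str) -> np.ndarray:
    """RSCU of the 59 synonymous codons: count / mean count of its family."""
    seq = cds.upper().replace("T", "U")
    counts = {c: 0 for c in SYN_CODONS}
    for i in range(0, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if codon in counts:  # skips AUG, UGG, stops and ambiguous codons
            counts[codon] += 1
    values = {}
    for cs in FAMILIES.values():
        total = sum(counts[c] for c in cs)
        for c in cs:
            # Assumption: an amino acid absent from the sequence gets RSCU 0.
            values[c] = counts[c] * len(cs) / total if total else 0.0
    return np.array([values[c] for c in SYN_CODONS])


def pca(matrix: np.ndarray, n_components: int = 2):
    """Plain PCA via SVD of the column-centred matrix (rows = sequences)."""
    x = np.asarray(matrix, dtype=float)
    xc = x - x.mean(axis=0)
    u, s, _ = np.linalg.svd(xc, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = s**2 / np.sum(s**2)  # fraction of variance per component
    return scores, explained[:n_components]


if __name__ == "__main__":
    seqs = ["GCUGCUGCAAAAUUU", "GCCGCGAAGUUCUUU", "GCUGCCAAAUUUUUC"]  # toy data
    X = np.vstack([rscu_vector(s) for s in seqs])
    scores, explained = pca(X)
    print(X.shape, scores.shape)  # (3, 59) (3, 2)
```

The six-fold families (Leu, Ser, Arg) are treated as single families here, which is the usual convention for RSCU; the first two columns of `scores` correspond to the kind of axes plotted in a PCA figure.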


2.4. Compositional and principal parameters analysis 

The compositional characteristics of the PDCoV coding sequences of the S gene and complete genomes were calculated. The overall nucleotide frequencies (GC%, AU%, A%, U%, G%, and C%) were estimated using BioEdit (http://www.softpedia.com/get/Science-CAD/BioEdit.shtml). The frequencies of A, C, G, and U at different positions of synonymous codons (GC1%, GC2%, GC3%, GC12%, A3%, U3%, G3%, C3%, and AU3%) of each sequence were computed using CUSP (http://emboss.toulouse.inra.fr/cgi-bin/emboss/cusp) and CodonW (http://codonw.sourceforge.net/).
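The positional composition indices listed above can be illustrated with a short Python sketch. It is not a reimplementation of CUSP or CodonW: the function name `composition` is hypothetical, all codons (including stops) are counted for GC1 to GC3, and GC3s follows our assumption of the usual convention, counting only third positions of synonymous codons (excluding AUG, UGG, and the three stop codons).

```python
# Codons excluded from GC3s under the convention assumed here.
NON_SYNONYMOUS = {"AUG", "UGG", "UAA", "UAG", "UGA"}


def composition(cds: str) -> dict[str, float]:
    """Overall and position-specific base composition of one in-frame CDS."""
    seq = cds.upper().replace("T", "U")
    if len(seq) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    n = len(seq)
    result = {f"{b}%": seq.count(b) / n for b in "AUCG"}
    result["GC%"] = result["G%"] + result["C%"]
    for pos in range(3):  # GC1, GC2, GC3 over all codons
        gc = sum(c[pos] in "GC" for c in codons)
        result[f"GC{pos + 1}"] = gc / len(codons)
    result["GC12"] = (result["GC1"] + result["GC2"]) / 2
    syn = [c for c in codons if c not in NON_SYNONYMOUS]
    result["GC3s"] = (
        sum(c[2] in "GC" for c in syn) / len(syn) if syn else float("nan")
    )
    return result


if __name__ == "__main__":
    r = composition("ATGGCCAAA")
    print(round(r["GC3"], 3), round(r["GC3s"], 3))  # 0.667 0.5
```

In the toy example, GC3 and GC3s differ only because the G at the third position of AUG is excluded from GC3s, which is the distinction between positional and synonymous-site indices.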


2.5. Relative dinucleotide abundance analysis 

The relative dinucleotides abundances were computed according to 
a previously reported method (Karlin and Burge, 1995). The odds ratio 
of the ability of the observed frequencies of the 16 dinucleotides was 
computed using the equation below: 
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Fig. 3. Dinucleotide abundancy of the PDCoV S gene (A) and the complete coding genomes (B). 


Table 3 

Relative dinucleotides frequencies among different groups of S gene and complete genomes of PDCoV strains. 

XY    China-S        TECV-S         UJK-S          All-S          China-CG       TECV-CG        UJK-CG         All-CG
AA    0.938 ± 0.009  0.947 ± 0.01   0.931 ± 0.004  0.936 ± 0.009  1.004 ± 0.002  1.01 ± 0.003   1.004 ± 0.001  1.005 ± 0.002
AC    1.246 ± 0.012  1.212 ± 0.011  1.244 ± 0.011  1.241 ± 0.018  1.18 ± 0.004   1.175 ± 0.004  1.178 ± 0.003  1.178 ± 0.003
AG    1.0815 ± 0.007 1.073 ± 0.007  1.077 ± 0.007  1.079 ± 0.006  1.048 ± 0.002  1.043 ± 0.003  1.046 ± 0.001  1.046 ± 0.002
AU    0.815 ± 0.007  0.838 ± 0.006  0.822 ± 0.009  0.821 ± 0.011  0.819 ± 0.003  0.821 ± 0.005  0.821 ± 0.002  0.821 ± 0.003
CA    1.292 ± 0.011  1.274 ± 0.02   1.293 ± 0.011  1.291 ± 0.016  1.219 ± 0.006  1.213 ± 0.008  1.223 ± 0.003  1.222 ± 0.005
CC    0.784 ± 0.015  0.806 ± 0.011  0.792 ± 0.006  0.788 ± 0.015  0.891 ± 0.005  0.888 ± 0.008  0.887 ± 0.003  0.888 ± 0.004
CG    0.513 ± 0.011  0.506 ± 0.013  0.514 ± 0.004  0.514 ± 0.011  0.591 ± 0.005  0.59 ± 0.004   0.59 ± 0.002   0.59 ± 0.003
CU    1.184 ± 0.016  1.188 ± 0.009  1.176 ± 0.01   1.18 ± 0.011   1.155 ± 0.002  1.162 ± 0.005  1.154 ± 0.002  1.155 ± 0.003
GA    0.853 ± 0.01   0.855 ± 0.01   0.867 ± 0.008  0.861 ± 0.011  0.928 ± 0.001  0.925 ± 0.002  0.928 ± 0.001  0.928 ± 0.002
GC    1.171 ± 0.013  1.177 ± 0.019  1.166 ± 0.005  1.17 ± 0.012   1.106 ± 0.004  1.115 ± 0.001  1.108 ± 0.002  1.108 ± 0.002
GG    0.983 ± 0.018  0.996 ± 0.033  0.976 ± 0.014  0.981 ± 0.019  0.93 ± 0.003   0.926 ± 0.012  0.929 ± 0.002  0.929 ± 0.005
GU    1.012 ± 0.009  0.997 ± 0.01   1.003 ± 0.006  1.005 ± 0.01   1.028 ± 0.003  1.026 ± 0.002  1.028 ± 0.001  1.028 ± 0.002
UA    0.915 ± 0.009  0.919 ± 0.011  0.908 ± 0.006  0.912 ± 0.011  0.867 ± 0.005  0.868 ± 0.007  0.864 ± 0.002  0.864 ± 0.003
UC    0.841 ± 0.015  0.852 ± 0.012  0.838 ± 0.007  0.842 ± 0.012  0.849 ± 0.001  0.85 ± 0.003   0.852 ± 0.002  0.852 ± 0.002
UG    1.316 ± 0.011  1.322 ± 0.01   1.326 ± 0.006  1.321 ± 0.008  1.333 ± 0.004  1.34 ± 0.003   1.335 ± 0.002  1.336 ± 0.002
UU    1.02 ± 0.015   1.004 ± 0.021  1.028 ± 0.006  1.02 ± 0.014   1.027 ± 0.003  1.02 ± 0.004   1.025 ± 0.002  1.025 ± 0.002

S, S gene; CG, complete genome; TECV, Thailand & Early China & Vietnam; UJK, USA & Japan & Korea.


where $\rho_{XY}$ is the relative abundance (odds ratio) of dinucleotide XY, $f_X$ is the frequency of nucleotide X, $f_Y$ is the frequency of nucleotide Y, $f_X f_Y$ is the expected frequency of dinucleotide XY, and $f_{XY}$ is the observed frequency of dinucleotide XY. As a general standard, when $\rho_{XY} < 0.78$ or $\rho_{XY} > 1.23$, the XY pair was considered under-represented or over-represented, respectively, compared with the random association of single nucleotides (Butt et al., 2016).
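As an illustration of the odds-ratio calculation above, the Python sketch below computes $\rho_{XY}$ for all 16 dinucleotides of a single coding sequence and applies the 0.78/1.23 cut-offs. It is a minimal sketch, not the software used in this study: the function names are hypothetical, and it assumes an unambiguous sequence (T is read as U), mononucleotide frequencies taken over the whole sequence, and dinucleotide frequencies taken over all overlapping pairs.

```python
from collections import Counter
from itertools import product

BASES = "ACGU"


def dinucleotide_odds_ratios(seq: str) -> dict[str, float]:
    """Relative abundance rho_XY = f_XY / (f_X * f_Y) for the 16 dinucleotides.

    Assumes an unambiguous coding sequence; T is converted to U.
    """
    seq = seq.upper().replace("T", "U")
    if len(seq) < 2 or set(seq) - set(BASES):
        raise ValueError("expected at least 2 nt containing only A/C/G/U(T)")
    n = len(seq)
    mono = Counter(seq)                                # single-nucleotide counts
    di = Counter(seq[i:i + 2] for i in range(n - 1))   # overlapping dinucleotides
    rho = {}
    for x, y in product(BASES, repeat=2):
        expected = (mono[x] / n) * (mono[y] / n)       # f_X * f_Y
        observed = di[x + y] / (n - 1)                 # f_XY
        rho[x + y] = observed / expected if expected else float("nan")
    return rho


def classify(rho_xy: float, low: float = 0.78, high: float = 1.23) -> str:
    """Apply the under-/over-representation cut-offs described in the text."""
    if rho_xy < low:
        return "under-represented"
    if rho_xy > high:
        return "over-represented"
    return "within expected range"
```

For example, `classify(dinucleotide_odds_ratios(seq)["CG"])` reports whether CpG falls below the 0.78 threshold for a given sequence `seq`. Per-group means such as those in Table 3 would then be obtained by averaging the per-sequence values.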


2.6. Relative synonymous codon usage (RSCU) 

RSCU describes the relative usage of a specific synonymous codon within the set of codons encoding the same amino acid, so that codon usage can be compared independently of amino acid composition. The RSCU value of a codon is the ratio of its observed frequency to the frequency expected if all synonymous codons for that amino acid were used equally (Chen and Chen, 2014). The RSCU is calculated as:

$$\mathrm{RSCU}_{ij} = \frac{g_{ij}}{\frac{1}{n_j}\sum_{i=1}^{n_j} g_{ij}}$$

where $g_{ij}$ is the observed number of the $i$th codon for the $j$th amino acid, which has $n_j$ kinds of synonymous codons. RSCU values of 1.0, > 1.0, and < 1.0 represent no bias, positive codon usage bias, and negative codon usage bias, respectively. The RSCU was calculated using MEGA7 (https://www.megasoftware.net/).
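A minimal Python sketch of the RSCU formula is shown below for readers who want to reproduce the calculation outside MEGA7. The helper names (`count_codons`, `rscu`) are hypothetical; the sketch assumes the standard genetic code and an in-frame sequence, and leaving stop codons out is an illustrative choice rather than a documented setting of this study.

```python
from collections import defaultdict
from itertools import product

# Standard genetic code, codons in UCAG order ("*" = stop); assumed here.
_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("UCAG", repeat=3), _AA)}


def count_codons(seq: str) -> dict[str, int]:
    """Count in-frame codons, assuming seq starts at codon position 1."""
    seq = seq.upper().replace("T", "U")
    counts: dict[str, int] = defaultdict(int)
    for i in range(0, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if codon in CODON_TABLE:  # ignore codons containing ambiguous bases
            counts[codon] += 1
    return dict(counts)


def rscu(codon_counts: dict[str, int]) -> dict[str, float]:
    """RSCU_ij = g_ij / ((1 / n_j) * sum_i g_ij) within each synonymous family j."""
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":  # stop codons are left out in this sketch
            families[aa].append(codon)

    values = {}
    for codons in families.values():
        total = sum(codon_counts.get(c, 0) for c in codons)
        for c in codons:
            # NaN marks amino acids that never occur in the sequence
            values[c] = codon_counts.get(c, 0) * len(codons) / total if total else float("nan")
    return values
```

Because Met and Trp each have a single codon, their RSCU is always 1.0 whenever they occur, so they carry no information about synonymous preference.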


2.7. Effective number of codons (ENC) analysis 


The degree of codon usage bias was measured by the ENC, which was estimated taking into account the number of amino acids and the gene length. ENC values range from 20 to 61; values closer to 20 indicate a high codon usage bias, and values closer to 61 indicate a low codon usage bias. The ENC value reflects the preference for particular synonymous codons within a codon family. Highly expressed genes often show a high codon usage bias, whereas poorly expressed genes contain more rare codons and thus a lower codon usage bias. Generally, codon usage is considered strongly biased when the ENC value is less than or equal to 45 (Comeron and Aguade, 1998). We used the following equation to calculate the ENC (Fuglsang, 2006):

$$\mathrm{ENC} = 2 + \frac{9}{\bar{F}_2} + \frac{1}{\bar{F}_3} + \frac{5}{\bar{F}_4} + \frac{3}{\bar{F}_6}$$

where $\bar{F}_i$ ($i$ = 2, 3, 4, 6) is the average of the $F$ values of the $i$-fold degenerate amino acids. The $F$ value of each amino acid was calculated as:

$$F = \frac{n\sum_{j}\left(\frac{n_j}{n}\right)^2 - 1}{n - 1}$$

where $n$ is the total number of occurrences of the codons for that amino acid and $n_j$ is the number of occurrences of the $j$th codon for that amino acid.
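The ENC and $F$ equations above can be sketched in Python as follows. This illustrates the classic formulation only and is not necessarily the exact variant used in this study. It assumes the standard genetic code, skips amino acids with fewer than two observed codons or with $F = 0$, fills a missing 3-fold class (Ile) with the mean of the 2- and 4-fold classes (a commonly used convention), and caps the result at 61. The function names are hypothetical.

```python
from collections import defaultdict
from itertools import product
from typing import Optional

# Standard genetic code, codons in UCAG order ("*" = stop); assumed here.
_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("UCAG", repeat=3), _AA)}


def homozygosity(counts: list[int]) -> Optional[float]:
    """F = (n * sum_j (n_j / n)^2 - 1) / (n - 1) for one amino acid."""
    n = sum(counts)
    if n < 2:
        return None  # F is undefined with fewer than two observed codons
    return (n * sum((c / n) ** 2 for c in counts) - 1) / (n - 1)


def enc(codon_counts: dict[str, int]) -> float:
    """ENC = 2 + 9/F2 + 1/F3 + 5/F4 + 3/F6 (illustrative sketch)."""
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa not in "*MW":  # stops, Met and Trp offer no synonymous choice
            families[aa].append(codon)

    f_by_class = defaultdict(list)  # degeneracy (2, 3, 4 or 6) -> F values
    for codons in families.values():
        f = homozygosity([codon_counts.get(c, 0) for c in codons])
        if f is not None and f > 0:  # skip undefined or zero F (assumption)
            f_by_class[len(codons)].append(f)

    mean_f = {k: sum(v) / len(v) for k, v in f_by_class.items()}
    if 3 not in mean_f and 2 in mean_f and 4 in mean_f:
        mean_f[3] = (mean_f[2] + mean_f[4]) / 2  # Ile missing: average F2 and F4
    missing = [k for k in (2, 3, 4, 6) if k not in mean_f]
    if missing:
        raise ValueError(f"no usable F values for degeneracy classes {missing}")

    value = 2 + 9 / mean_f[2] + 1 / mean_f[3] + 5 / mean_f[4] + 3 / mean_f[6]
    return min(value, 61.0)  # ENC cannot exceed 61
```

The two bounds illustrate the scale: when each amino acid uses a single codon, every $F$ equals 1 and the sketch returns 20; with perfectly even usage the raw sum slightly exceeds 61 and is capped.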
Fig. 4. ENC-plot analysis (ENC plotted against GC3s) of the PDCoV S gene (A) and complete coding genomes (B). The China, USA-Japan-Korea, and Thailand-Early China-Vietnam groups are represented in light blue, green, and pink, respectively.


2.8. ENC-plot analysis 

ENC-plot analysis is commonly used to identify the factors influencing codon usage bias (e.g., mutation pressure). The ENC values were plotted against the GC3s values (the frequency of guanine or cytosine at the third position of synonymous codons, excluding Met, Trp, and stop codons) (Karlin and Burge, 1995). When codon usage is constrained only by GC3 mutational bias, the ENC values are expected to fall on a theoretical curve that expresses the expected ENC as a function of GC3s. When the observed ENC values of the sequences lie below this standard curve, this suggests that natural selection plays a role in driving codon usage bias (Fuglsang, 2008). The theoretical ENC values in ENC-plot analysis were calculated as follows:

$$\mathrm{ENC}_{\mathrm{expected}} = 2 + s + \frac{29}{s^2 + (1 - s)^2}$$

where $s$ denotes the frequency of C or G at the third position of synonymous codons (i.e., GC3s).
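For reference, the expected-ENC curve can be evaluated directly. The short Python sketch below (hypothetical function names) returns the theoretical ENC for a given GC3s value and flags whether an observed point lies below the curve, the situation the text interprets as a sign of selection.

```python
def expected_enc(gc3s: float) -> float:
    """Expected ENC when codon usage is shaped only by GC3s composition."""
    if not 0.0 <= gc3s <= 1.0:
        raise ValueError("GC3s must be a fraction between 0 and 1")
    return 2 + gc3s + 29 / (gc3s ** 2 + (1 - gc3s) ** 2)


def below_expected_curve(observed_enc: float, gc3s: float) -> bool:
    """True when an observed (GC3s, ENC) point lies under the theoretical curve."""
    return observed_enc < expected_enc(gc3s)
```

For example, the expected ENC is 60.5 at GC3s = 0.5, 31 at GC3s = 0, and 32 at GC3s = 1, so points well below about 60 near balanced GC3s indicate more bias than composition alone predicts.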

2.9. Neutrality plot analysis 

Neutrality (evolutionary neutrality) analysis was carried out to compare and define the effects of natural selection and mutation pressure on PDCoV codon usage patterns by plotting the GC12s values (the mean GC content at the first and second codon positions) against the GC3s values of synonymous codons (diagonal analysis). The slope of the regression line is taken as the mutation-selection balance coefficient: it represents the relative neutrality (the contribution of mutation pressure), whereas 1 - slope represents the relative constraint imposed by natural selection. If all points are distributed along the diagonal (slope close to 1) and there is no significant difference among the three codon positions, this indicates that there is only weak or no external selection pressure. Conversely, if the regression line is parallel or only slightly tilted relative to the horizontal axis, the correlation between changes in GC12 and GC3 is very low, indicating that natural selection outweighs mutation pressure. Thus, the regression line shows how natural selection balances the degree of neutrality (Kumar et al., 2016).

2.10. Parity rule 2 (PR2) analysis 

PR2 analysis was used to investigate the effects of selection and mutation pressure on codon usage. The PR2 plot shows the AU bias [A3/(A3 + U3)] on the ordinate against the GC bias [G3/(G3 + C3)] on the abscissa. The center of the plot, where both coordinates are 0.5, corresponds to A = U and G = C (PR2), meaning that there is no bias between mutation and selection (substitution rates) (Sueoka, 1996).
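A minimal sketch of the PR2 coordinates for one sequence is shown below. It uses all third codon positions, as described above (restricting the count to fourfold-degenerate codons is a common variant); the function name is hypothetical, and a coordinate is returned as NaN when the sequence has no A/U or no G/C at third positions.

```python
def pr2_coordinates(seq: str) -> tuple[float, float]:
    """Return (G3/(G3+C3), A3/(A3+U3)) from third codon positions."""
    seq = seq.upper().replace("T", "U")
    thirds = [seq[i + 2] for i in range(0, len(seq) - 2, 3)]
    a3, u3 = thirds.count("A"), thirds.count("U")
    g3, c3 = thirds.count("G"), thirds.count("C")
    gc_bias = g3 / (g3 + c3) if g3 + c3 else float("nan")  # abscissa
    au_bias = a3 / (a3 + u3) if a3 + u3 else float("nan")  # ordinate
    return gc_bias, au_bias
```

A sequence that satisfies PR2 exactly plots at (0.5, 0.5); departures along either axis indicate the A/U or G/C imbalance discussed in Section 3.8.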
3. Results

Fig. 5. Neutrality analysis (GC12 against GC3) of the PDCoV S gene (A) and complete coding genomes (B). The China, USA-Japan-Korea, and Thailand-Early China-Vietnam groups are represented in light blue, green, and pink, respectively.



3.1. Recombination and phylogenetic analysis 

After removal of recombinant sequences, 132 S gene sequences and 64 complete genomes remained for further analysis. Phylogenetic analysis of the S gene based on ML (Fig. 1A) and BI (Fig. 1B) trees revealed three distinct PDCoV groups: China, USA-Japan-Korea, and Thailand-Early China-Vietnam. We then used these three groups to investigate codon usage and its associated factors.

3.2. Principal component analysis (PCA)

PCA showed that the three groups clustered separately, especially 
the USA-Japan-Korea group, although several overlaps existed between 
the USA-Japan-Korea and Thailand-Early China-Vietnam groups 
(Fig. 2). For whole genomes, the three groups clustered separately too, 
except for several overlaps between the USA-Japan-Korea and the 
Thailand-Early China-Vietnam groups. 

3.3. Nucleotide composition of PDCoV S gene and complete genomes 

The nucleotide U was the most abundant in the S gene, followed by A, C, and G, regardless of phylogenetic group (Table 1). Detailed information on nucleotide composition is shown in Table S2. The nucleotide composition at the third position of synonymous codons (A3, C3, G3, U3) showed that the frequencies of U3 and A3 were higher than those of C3 and G3. The AU and GC contents indicated that PDCoV coding sequences are AU-rich. Analysis of the first, second, and third codon positions showed that GC1 was the highest, followed by GC2 and GC3 (Table S2). The same pattern was identified for whole genomes. Overall, these results illustrate that a relatively large part of the PDCoV coding sequence comprises A and U nucleotides.

3.4. PDCoV relative synonymous codon usage 

Among the 18 preferred synonymous codons for the corresponding amino acids of the PDCoV S gene, U-ending codons predominated (Perez-Rivera et al., 2019) (Table 2). A total of 7 of the 18 preferred codons had RSCU values greater than 1.6 (CUU (L), GUU (V), UCU (S), CCU (P), ACU (T), AGA (R), and GGC (G)). The remaining 11 preferred codons had RSCU values less than 1.6, and none of the preferred codons was underrepresented. For whole genomes, U-ending codons were also predominant among the 18 preferred synonymous codons (Table 2). The RSCU analyses and the nucleotide composition indicated that compositional constraints (here, U richness) had the greatest influence on the selection of the preferred codons.


Fig. 6. Parity Rule 2 (PR2)-bias plot [A3/(A3 + U3) against G3/(G3 + C3)]. The PR2 bias plot was calculated for the S gene (A) and complete coding genomes (B). The China, USA-Japan-Korea, and Thailand-Early China-Vietnam groups are represented in light blue, green, and pink, respectively.

3.5. Factors driving dinucleotide frequency abundance


The relative abundances of the 16 dinucleotides in PDCoV coding sequences were calculated. Dinucleotides were not present randomly: none of the relative abundance values corresponded to the theoretical value of 1.0 (Fig. 3, Table 3). In the S gene, CpA (1.29 ± 0.016) and UpG (1.32 ± 0.008) were over-represented to different degrees, and only CpG (0.514 ± 0.011) was under-represented. For whole genomes, the over-represented and under-represented dinucleotides were UpG (1.34 ± 0.002) and CpG (0.59 ± 0.003), respectively.

3.6. ENC analysis 
ENC values were estimated to evaluate the extent of codon usage deviation in the coding sequences of different PDCoV isolates. This analysis showed that PDCoV coding sequences, for both the S gene and whole genomes, were relatively conserved and stable, with a low codon usage bias. The ENC values of the S coding sequences ranged from 52.71 to 52.97, with an average of 52.853 (ENC > 40) (Table 1). The ENC values of complete genome coding sequences also fell within the range of the S gene, with no obvious difference among phylogenetic groups.

3.7. Influence of mutation pressure on the PDCoV codon usage pattern 

ENC-plot analysis was carried out to reveal the constraint of mutation pressure on the PDCoV codon usage pattern. ENC values were plotted against GC3 values for each phylogenetic group. For the S gene, all points, regardless of group, were concentrated on the left side of the plot and close to the expected curve (Fig. 4A). For whole genome coding sequences, all points also lay below but close to the standard curve (Fig. 4B).

3.8. Influence of natural selection on the PDCoV codon usage pattern 

Here, neutrality (diagonal) analysis of GC12s against GC3s was used to assess the effects of natural selection and mutation pressure (Fig. 5). For the S gene, the relationship between GC3s and GC12s was calculated for each of the three phylogenetic groups. The regression coefficients for the USA-Japan-Korea, China, and Thailand-Early China-Vietnam groups were 0.2017 ± 0.3707, 0.143 ± 0.3942, and 0.1142 ± 0.4873, respectively. Thus, the relative constraint of natural selection (1 - coefficient) was 79.83%, 85.7%, and 88.58%, respectively, for the S gene (Fig. 5A). For whole genomes, GC12s and GC3s were significantly correlated, with a coefficient of 0.1897 ± 0.387 for the USA-Japan-Korea group, indicating a relative constraint of 81.03% by natural selection and a relative neutrality of 18.97% (a coefficient of 1 would indicate 100% neutrality, i.e., 0% constraint) (Fig. 5B). Because of the limited number of complete genome sequences in the China and Thailand-Early China-Vietnam groups, these groups were excluded from the whole-genome results. Overall, these results indicate that mutation pressure acts at all codon positions, but natural selection plays the major role in driving the codon usage bias of PDCoV.

In addition, PR2 analysis was carried out (Fig. 6). For both the S gene and whole genomes, A ≠ U and C ≠ G, which indicates that mutation pressure and natural selection play unequal roles in shaping the codon usage of PDCoV.

4. Discussion 

PDCoV is an emerging coronavirus that infects the entire small intestine, especially the jejunum and ileum, causing severe enteritis, diarrhea, and vomiting in piglets. PDCoV was first discovered in Hong Kong, China, in 2012 (Woo et al., 2012). At the beginning of 2014, PDCoV was first reported in the USA, and by December 2014 at least 17 US states had confirmed its presence. In recent years, China, South Korea, Thailand, and other Asian countries have suffered from recurrent outbreaks (Lorsirigool et al., 2016; Janetanakit et al., 2016; Dong et al., 2015; Lee et al., 2016). Phylogenetic analysis is widely used to study viral evolution (He et al., 2018; Li et al., 2018a; Su et al., 2017, 2016). Here, we first analyzed the codon usage patterns of the S coding sequences, as well as the whole genome coding sequences, of PDCoVs isolated around the world to determine the factors driving codon usage, and to provide a comprehensive understanding of the characteristics and evolution of PDCoV coding genes.

Phylogenetic analysis of the S gene revealed that the sequences clustered into three groups, similar to a previous study (Zhang et al., 2019), but with greater accuracy because more methods were applied and recombinant sequences were excluded. PCA also indicated three potential evolutionary groups.

Based on the S coding gene and complete coding genomes, we found a significant preference for A and U nucleotides over G and C. The AU and GC contents were unequal and skewed towards AU nucleotides. If synonymous codon usage were affected only by mutation pressure, the frequency of U and A nucleotides at the third codon position should be equal to that of G and C (van Hemert et al., 2016). Thus, we can conclude that there was a low bias in nucleotide usage in all PDCoV strains. RSCU analysis revealed that PDCoV genomes tend to use U-ending codons. In addition, the relative abundance distribution of the 16 dinucleotides showed that codons and dinucleotides were used unequally and non-randomly. Dinucleotide abundance influences codon usage bias in certain organisms, including RNA and DNA viruses (Rothberg and Wimmer, 1981). Dinucleotide frequencies may be shaped by amino acid composition or by codon usage bias; therefore, we analyzed the distribution of dinucleotide composition (Plotkin et al., 2004; Cristina et al., 2015). The translational selection pressure on a dinucleotide is the entropy cost of the set of constraints that alter the number of dinucleotide occurrences, in this case the amino acid sequence of the encoded protein and the codon usage bias (Cristina et al., 2015). Analyses of codon and dinucleotide frequencies revealed that translational selection also played a part in the codon usage of PDCoVs. These initial observations prompted us to assess the extent of codon usage bias using ENC analysis. For PDCoV, the ENC value based on the S gene or complete coding genomes was 52, indicating a slight bias and suggesting that different PDCoVs are relatively conserved and stable. Previous studies indicated that ENC values correlate negatively with gene expression (van Hemert and Berkhout, 2016). Thus, a higher ENC value indicates lower gene expression and weaker codon preference. A low codon bias could be explained by the need to adapt for efficient replication and survival in the host, and to reduce the energy required for viral biosynthesis while avoiding competition with host protein synthesis (van Hemert et al., 2016). When the ENC and GC3 values of PDCoVs were plotted, mutation pressure was revealed as a moderate factor influencing the PDCoV codon usage pattern. According to previous reports, both natural selection and mutation pressure can affect the ENC value, so the relative contributions of selection and mutation to the codon usage pattern cannot be robustly inferred from ENC values alone (Chen et al., 2014; Gu et al., 2004). It is worth mentioning that the codon usage bias of species with A/U-biased genomes differs from that of genomes with a G/C bias. Therefore, a simple ENC-GC3 plot analysis might be misleading. Mutation pressure generally plays a role in driving the codon usage of viruses. Here, using neutrality plots, we found that natural selection was a more dominant factor than mutation pressure (Shi et al., 2013). Natural selection can lead to weak codon usage bias as the virus adapts to its host cells (Matsumoto et al., 2016). PR2 bias plot analysis showed that both natural selection and mutation pressure contributed to the observed codon bias, consistent with the neutrality analysis.

In summary, we found that the codon usage of the S gene was similar to that of the complete coding genome. Further exploration of the functions and features of individual functional genes may open new perspectives.

5. Conclusion 

Here, we found that, to a large extent, the codon usage pattern and sequence characteristics of PDCoVs were restricted by evolutionary processes. Briefly, PDCoV has a low codon usage bias, which is affected by natural selection, mutation pressure, and dinucleotide abundance. The primary factor affecting the PDCoV codon usage pattern was natural selection. Additionally, the results of PCA and phylogenetic analysis were highly consistent, suggesting that codon usage pattern analysis can reveal evolutionary clustering relationships between strains based on their genetic composition. This study suggests that monitoring updated sequences of this novel, emerging virus would provide clues to better understand viral evolution and the disease.